AITopics | City of Cebu

City of Cebu

Auslan-Daily: Australian Sign Language Translation for Daily Communication and News

Neural Information Processing Systems, Feb-18-2026, 04:02:11 GMT

Because different geographic regions generally have their own native sign languages, it is valuable to establish corresponding sign language translation (SLT) datasets to support related communication and research. Auslan, the sign language specific to Australia, still lacks a dedicated large-scale dataset for SLT.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(22 more...)

Industry:

Education > Curriculum > Subject-Specific Education (0.96)
Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

feb34ce77fc8b94c85d12e608b23ce67-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Oct-9-2025, 12:52:44 GMT

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(22 more...)

Industry:

Health & Medicine (0.69)
Media (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Semantic Decomposition and Selective Context Filtering -- Text Processing Techniques for Context-Aware NLP-Based Systems

Villardar, Karl John

arXiv.org Artificial Intelligence, Feb-19-2025

In this paper, we present two techniques for use in context-aware systems: Semantic Decomposition, which sequentially decomposes input prompts into a structured, hierarchical information schema that systems can parse and process easily, and Selective Context Filtering, which enables systems to systematically filter out specific irrelevant sections of the contextual information fed through a system's NLP-based pipeline. We explore how context-aware systems and applications can use these two techniques to implement dynamic LLM-to-system interfaces, improve an LLM's ability to generate more contextually cohesive user-facing responses, and optimize complex automated workflows and pipelines.

arxiv, information, language model, (14 more...)

arXiv.org Artificial Intelligence

2502.14048

Country:

Asia > South Korea (0.04)
Asia > Philippines > Visayas > Central Visayas > Province of Cebu > City of Cebu (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine (1.00)
Banking & Finance (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Colexifications for Bootstrapping Cross-lingual Datasets: The Case of Phonology, Concreteness, and Affectiveness

Chen, Yiyi, Bjerva, Johannes

arXiv.org Artificial Intelligence, Jun-5-2023

Colexification refers to the linguistic phenomenon where a single lexical form is used to convey multiple meanings. By studying cross-lingual colexifications, researchers have gained valuable insights into fields such as psycholinguistics and cognitive sciences [Jackson et al., 2019]. While several multilingual colexification datasets exist, there is untapped potential in using this information to bootstrap datasets across such semantic features. In this paper, we aim to demonstrate how colexifications can be leveraged to create such cross-lingual datasets. We showcase curation procedures which result in a dataset covering 142 languages from 21 language families around the world. The dataset includes ratings of concreteness and affectiveness, mapped with phonemes and phonological features. We further analyze the dataset along different dimensions to demonstrate the potential of the proposed procedures in facilitating further interdisciplinary research in psychology, cognitive science, and multilingual natural language processing (NLP). Based on initial investigations, we observe that i) colexifications that are closer in concreteness/affectiveness are more likely to colexify; ii) certain initial/last phonemes are significantly correlated with concreteness/affectiveness within language families, such as /k/ as the initial phoneme in both Turkic and Tai-Kadai correlated with concreteness, and /p/ in Dravidian and Sino-Tibetan correlated with valence; iii) the type-to-token ratio (TTR) of phonemes is positively correlated with concreteness across several language families, while the length of phoneme segments is negatively correlated with concreteness; iv) certain phonological features are negatively correlated with concreteness across languages. The dataset is made public online for further research.
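The type-to-token ratio mentioned in finding iii) can be illustrated with a minimal sketch. It assumes the usual definition (distinct items divided by total items) applied to a list of phoneme symbols; the function name `type_token_ratio` and the toy phoneme sequences are hypothetical and are not taken from the paper or its dataset.

```python
def type_token_ratio(phonemes):
    """Return distinct phonemes (types) divided by total phonemes (tokens)."""
    if not phonemes:
        return 0.0  # assumption: define the ratio of an empty sequence as 0
    return len(set(phonemes)) / len(phonemes)


# Toy, hand-written phoneme sequences (illustrative only, not from the dataset).
repetitive = ["b", "a", "n", "a", "n", "a"]  # 3 types over 6 tokens
varied = ["k", "a", "t"]                     # 3 types over 3 tokens

print(type_token_ratio(repetitive))
print(type_token_ratio(varied))
```

A word that reuses the same few phonemes gets a lower ratio than one whose phonemes are all distinct, which is the quantity the abstract reports as correlating with concreteness.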

computational linguistic, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2306.02646

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
(19 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry: Banking & Finance > Credit (0.34)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

FACTIFY-5WQA: 5W Aspect-based Fact Verification through Question Answering

Rani, Anku, Tonmoy, S. M Towhidul Islam, Dalal, Dwip, Gautam, Shreya, Chakraborty, Megha, Chadha, Aman, Sheth, Amit, Das, Amitava

arXiv.org Artificial Intelligence, May-28-2023

Automatic fact verification has received significant attention recently. Contemporary automatic fact-checking systems focus on estimating truthfulness using numerical scores which are not human-interpretable. A human fact-checker generally follows several logical steps to verify a verisimilitude claim and conclude whether it is truthful or a mere masquerade. Popular fact-checking websites follow a common structure for fact categorization such as half true, half false, false, pants on fire, etc. Therefore, it is necessary to have an aspect-based (delineating which part(s) are true and which are false) explainable system that can assist human fact-checkers in asking relevant questions related to a fact, which can then be validated separately to reach a final verdict. In this paper, we propose a 5W framework (who, what, when, where, and why) for question-answer-based fact explainability. To that end, we present a semi-automatically generated dataset called FACTIFY-5WQA, which consists of 391,041 facts along with relevant 5W QAs - underscoring our major contribution to this paper. A semantic role labeling system has been utilized to locate 5Ws, which generates QA pairs for claims using a masked language model. Finally, we report a baseline QA system to automatically locate those answers from evidence documents, which can serve as a baseline for future research in the field. Lastly, we propose a robust fact verification system that takes paraphrased claims and automatically validates them. The dataset and the baseline model are available at https://github.com/ankuranii/acl-5W-QA

covid-19 vaccine, natural language, question answering, (18 more...)

arXiv.org Artificial Intelligence

2305.04329

Country:

North America > United States > Virginia (0.05)
Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.05)
Asia > China > Hubei Province > Wuhan (0.05)
(15 more...)

Genre: Research Report (0.82)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports > Football (1.00)
Law (1.00)
(6 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Automated Identification of Disaster News For Crisis Management Using Machine Learning

Regacho, Lord Christian Carl H., Matsushita, Ai, Ceniza-Canillo, Angie M.

arXiv.org Artificial Intelligence, Jan-24-2023

A lot of news sources picked up on Typhoon Rai (also known locally as Typhoon Odette), along with fake news outlets. The study homed in on this issue to create a model that can distinguish between legitimate and illegitimate news articles. With this in mind, we chose the following machine learning algorithms in our development: Logistic Regression, Random Forest, and Multinomial Naive Bayes. Bag of Words, TF-IDF, and lemmatization were implemented in the model. Using 160 datasets gathered from legitimate and illegitimate sources, the models were trained and tested. By combining all the machine learning techniques, the Combined BOW model reached an accuracy of 91.07%, precision of 88.33%, recall of 94.64%, and F1 score of 91.38%, and the Combined TF-IDF model reached an accuracy of 91.18%, precision of 86.89%, recall of 94.64%, and F1 score of 90.60%.
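The F1 scores quoted above can be cross-checked against the quoted precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the helper name `f1_score` and the `reported` dictionary are illustrative, and only the numbers themselves come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as fractions, copied from the abstract.
reported = {
    "Combined BOW": (0.8833, 0.9464, 0.9138),
    "Combined TF-IDF": (0.8689, 0.9464, 0.9060),
}

for name, (precision, recall, f1_reported) in reported.items():
    f1 = f1_score(precision, recall)
    print(f"{name}: computed F1 = {f1:.4f}, reported F1 = {f1_reported:.4f}")
```

For both models the recomputed value agrees with the reported F1 to two decimal places in percentage terms, so the three figures are internally consistent.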

information retrieval, machine learning, news article, (19 more...)

arXiv.org Artificial Intelligence

2301.09896

Country:

Asia > Philippines > Visayas > Central Visayas > Province of Cebu > City of Cebu (0.05)
Europe > Andorra > Canillo > Canillo (0.05)
North America > Canada (0.04)
(2 more...)

Genre: Research Report > New Finding (0.39)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.57)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.53)
(3 more...)

Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text

Aquino, Angelina, de Leon, Franz

arXiv.org Artificial Intelligence, Jan-5-2023

The grammatical analysis of texts in any written language typically involves a number of basic processing tasks, such as tokenization, morphological tagging, and dependency parsing. State-of-the-art systems can achieve high accuracy on these tasks for languages with large datasets, but yield poor results for languages which have little to no annotated data. To address this issue for the Tagalog language, we investigate the use of alternative language resources for creating task-specific models in the absence of dependency-annotated Tagalog data. We also explore the use of word embeddings and data augmentation to improve performance when only a small amount of annotated Tagalog data is available. We show that these zero-shot and few-shot approaches yield substantial improvements on grammatical analysis of both in-domain and out-of-domain Tagalog text compared to state-of-the-art supervised baselines.

artificial intelligence, natural language, pipeline, (18 more...)

arXiv.org Artificial Intelligence

2208.01814

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.15)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(16 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

PerPaDa: A Persian Paraphrase Dataset based on Implicit Crowdsourcing Data Collection

Mohtaj, Salar, Tavakkoli, Fatemeh, Asghari, Habibollah

arXiv.org Artificial Intelligence, Jan-17-2022

In this paper we introduce PerPaDa, a Persian paraphrase dataset collected from users' input in a plagiarism detection system. As an implicit crowdsourcing experience, we have gathered a large collection of original and paraphrased sentences from Hamtajoo, a Persian plagiarism detection system in which users try to conceal cases of text re-use in their documents by paraphrasing and re-submitting manuscripts for analysis. The compiled dataset contains 2446 instances of paraphrasing. To improve the overall quality of the collected data, some heuristics have been used to exclude sentences that don't meet the proposed criteria. The introduced corpus is much larger than the available datasets for the task of paraphrase identification in Persian. Moreover, there is less bias in the data compared to similar datasets, since the users did not follow fixed predefined rules to generate texts similar to their original inputs.

dataset, paraphrased sentence, perpada, (11 more...)

arXiv.org Artificial Intelligence

2201.06573

Country:

Europe > Germany > Berlin (0.05)
Asia > India > West Bengal > Kolkata (0.04)
Asia > China > Beijing > Beijing (0.04)
(7 more...)

Genre: Research Report (0.50)

Industry: Education (0.60)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Automatic Language Identification in Texts: A Survey

Jauhiainen, Tommi, Lui, Marco, Zampieri, Marcos, Baldwin, Timothy, Lindén, Krister

Journal of Artificial Intelligence Research, Aug-25-2019

Language identification ("LI") is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used in the LI literature. We describe the features and methods using a unified notation, to make the relationships between methods clearer. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.

pattern recognition association, text-based language identification, word-level language identification, (16 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11675

AI Access Foundation

Country: